Abstract: Several studies have identified a significant amount
of redundancy in the network traffic. For example, it is
demonstrated that there is a great amount of redundancy within the
content of a server over time. This redundancy can be leveraged
to reduce the network flow by the deployment of memory units 
in the network. The question that arises is whether or not the 
deployment of memory can result in a fundamental improvement 
in the performance of the network. In this paper, we answer 
this question affirmatively by first establishing the fundamental 
gains of memory-assisted source compression and then applying 
the technique to a network. Specifically, we investigate the gain 
of memory-assisted compression in random network graphs
consisting of a single source and several randomly selected
memory units. We find a threshold value for the number of 
memories deployed in a random graph and show that if the 
number of memories exceeds the threshold we observe network- 
wide reduction in the traffic. 

I. Introduction 

Several studies have demonstrated the existence of a considerable
amount of redundancy in Internet data traffic, and a
few major dimensions have been identified as the main sources
of this redundancy. For example, the contents
of a web server contain more than 60% redundant data on
average within a ten-day period [1]. Further, there are
some popular files in each server that may be requested by 
several clients in the network. This redundancy in the data 
may be leveraged to reduce the communication within the 
network. The existing redundancy elimination techniques are 
mostly based on end-to-end caching mechanisms, where the 
redundant content is only cached in the server and the client for 
future reference. However, these end-to-end approaches do
not efficiently leverage the redundancy in the network because
there is no memorization in today's Internet except end-to-end
caching mechanisms [1].

Recently, a few studies considered the deployment of
redundancy elimination techniques within the network [3], [4],
where the intermediate nodes in the network are assumed
to be capable of caching previous communications
and processing the data. These works studied network
flow reduction via ad hoc solutions, such as deduplication of
repeated traffic segments, without any connection to
information theory. The objective of this paper is to study
this problem from an information-theoretic point of view. We
assume that some intermediate nodes (referred to as memory
nodes) are capable of memorization (to be defined later)
of the previous communications that have passed through
them. We further assume that the memorized content will be
used for memory-assisted source coding in the network, which
we refer to as network flow compression with memory.
However, several questions remain open regarding memory-assisted
compression of the network flow. Does the deployment of
memory in the network provide any fundamental benefit over
end-to-end solutions? How much saving can be achieved
using network compression? How can the savings be achieved?

This paper attempts to answer the above fundamental
questions. To the best of our knowledge, this is the first work that
addresses memory-assisted redundancy elimination of the
network flow from an information-theoretic point of view. The data
transmitted inside the network have different spatial
and temporal probability distributions. Thus, prior knowledge
of the probability distributions underlying the contents may
not be assumed. Hence, one important characteristic of any
compression solution is that it must be universal, in the sense
that it must be able to remove redundancy without knowing
the statistics and nature of the data [5], [6].

In this paper, we focus our study on the fairly broad class 
of parametric information sources, which includes the class
of Markov sources of any finite order [7]. We formulate the
problem as memory-assisted compression of the network flow, 
where the redundancy in the traffic data is to be removed. 
In this context, the redundancy elimination could be viewed 
as universal source coding, where the goal is to represent 
the data with a minimum length codeword Q. However, as 
we will discuss in Sec. II, this problem is different from
the distributed source coding problem because of memory 
units. We investigate the fundamental gain of the memory- 
assisted network flow compression over end-to-end universal 
compression techniques, where the content is compressed at 
the server and routed via routers without any memorization 
inside the network. Throughout this paper, we focus on the 
problem involving a single source (content server) that is fixed 
in the network; the extension to multiple sources is left as
future work. 

In what follows, we first describe memory-assisted
source coding in Sec. II. The memory deployment problem
and the gain of memorization in networks are discussed in
Sec. III. In Sec. IV, we study memory-assisted network flow
compression on Erdős-Rényi (ER) random graphs and find
a threshold value for the number of memory units. Finally,
simulation results are provided in Sec. V.



II. Memory-Assisted Source Coding 

In what follows, we introduce memory-assisted source
coding via the sample network depicted in Fig. 1, which
consists of a server $S$, a memory unit $\mu$ (which also acts
as a router), and the clients $C_1$ and $C_2$. We assume that
the communication is as follows. First, client $C_1$, who has
not previously communicated with the server, acquires the
sequence $f_1$ from the server through the intermediate node $\mu$.
Next, client $C_2$, who also has not previously communicated
with $S$, acquires the sequence $f_2$ from $S$ through $\mu$. In order to
show the benefits of the deployment of memory in the router,
we compare three schemes:

• NcompNmem (No compression with no memory), which 
does not apply any compression and does not utilize the 
memory unit. 

• UcompNmem (Universal compression with no memory), 
which only applies end-to-end universal source coding at 
the source without using the memory unit. 

• UcompWmem (Universal compression with memory), 
which assumes that the router has memory and utilizes 
the memory unit when compressing the data at the source. 

In all of the above scenarios, we assume that the client has
had no previous communication with the server, since this is usually
the case in networks. On the other hand, the memory/router
is capable of memorizing the communication of $f_1$ between
$S$ and $C_1$ in order to better compress $f_2$ (on the link from
$S$ to $\mu$), which is then delivered to $C_2$. We will later
demonstrate that even if sequences $f_1$ and $f_2$ are independent
given that the source model is known, the memorization of
$f_1$ in $\mu$ can result in the reduction of the communication
for the transfer of $f_2$ from $S$ to $C_2$. This seemingly
counterintuitive phenomenon is due to the fact that the source model
is not known a priori (at $\mu$). The underlying source coding
must be universal, which imposes a compression overhead
when the length of the sequence is finite (small) [7]. On the
other hand, sequence $f_1$ does indeed contain some information
about the unknown source parameters, to the extent that an
infinite-length sequence $f_1$ can be used to identify all of the
unknown parameters of the source. This side information can
be memorized at the memory unit $\mu$ and the source $S$ for the
compression of $f_2$. Then the memory unit can decode $f_2$ using
the side information and send $f_2$ to $C_2$. It is important to note
that the saving of memory-assisted compression in terms of
flow reduction is observed on the $S$-$\mu$ link. For example, if $f_1$
and $f_2$ are of unit size and the memorization helps to compress
$f_2$ by a factor of 2, the total flow for delivering $f_2$ is reduced
from $1 + 1$ to $0.5 + 1$ bit×hop, where there is a gain of 2 on
the link between $S$ and $\mu$.
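
The bit×hop arithmetic in this example can be checked with a short sketch. The function name `flow_bit_hop` and its arguments are illustrative and do not appear in the paper; the sketch assumes that the gain applies only to the hops between the source and the memory.

```python
def flow_bit_hop(size, hops_to_memory, hops_after_memory, gain=1.0):
    """Bit x hop cost of delivering one sequence through a memory node.

    Assumption (hypothetical helper): the sequence is compressed by `gain`
    on the source-to-memory hops only, and travels uncompressed afterwards.
    """
    return size * hops_to_memory / gain + size * hops_after_memory


# Unit-size sequence, one hop S -> mu and one hop mu -> C2.
without_memory = flow_bit_hop(1.0, 1, 1)            # 1 + 1
with_memory = flow_bit_hop(1.0, 1, 1, gain=2.0)     # 0.5 + 1
print(without_memory, with_memory)
```

Only the first term shrinks with the gain, which is why the placement of the memory along the route matters in the network-level analysis.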

While related, the memory-assisted source coding problem
is different from those addressed by distributed source
compression techniques (i.e., the Slepian-Wolf problem) that target
multiple correlated sources sending information to the same
destination [5], Q. As described in the above example, the
memory-assisted source coding gain is due to the fact that the
source parameter is unknown. Therefore, when the length of
the sequence $f_2$ increases, the memory-assisted source coding
gain, with respect to UcompNmem, vanishes since the source
parameter can be well estimated using the sequence itself, and
hence, $f_2$ can be compressed to the fundamental limit (i.e., the
entropy rate). On the other hand, in the Slepian-Wolf problem,
the gains are achievable in the asymptotic regime. Further, the
memorization of a sequence $f_1$ that is independent of the
sequence $f_2$ can result in a gain in memory-assisted compression
of $f_2$ whereas, in the Slepian-Wolf problem, the gain is due to
the bit-by-bit correlation between the two sequences. The
memory-assisted compression problem is also distinct from lossless
source coding techniques such as LZ and CTW that simply remove
redundancy in a single piece of content without regard to any
memorization at the intermediate node /J, Q, |l6l.

Fig. 1. The basic memory-assisted network flow compression scenario.

Next, we characterize the benefits of network memory in
the context of a universal source coding problem for the
class of smooth parametric sources [7], [10]. Let
$l(C_n, x^n) = l_n(x^n)$ denote the length function that describes
the codeword associated with the sequence $x^n$. The expected
codeword length $\mathbf{E} l_n(X^n)$ quantifies the compression
performance of UcompNmem, where $\mathbf{E}[\cdot]$ denotes the
expectation operator. For an asymptotically optimal code, in the
sense that it achieves the entropy rate, we have
$\frac{1}{n}(\mathbf{E} l_n(X^n) - H_n(X^n)) \to 0$ as
$n \to \infty$, where $H_n(X^n)$ denotes the entropy of a sequence of
length $n$. In the case that the router is capable of memorization
(UcompWmem), assume that $y^m$ is another sequence of length
$m$ from the same source that generates $x^n$. By context
memorization, we mean that both the source $S$ and the memory
unit $\mu$ have already visited the sequence $y^m$. Let $l_{n|m}$ be a
regular length function where $S$ and $\mu$ have access to a context
memory of length $m$ from the previous communications.
Then, the expected codeword length with memory
$\mathbf{E} l_{n|m}(X^n)$ characterizes the compression performance
of UcompWmem for $x^n$ on the link from $S$ to $\mu$ in Fig. 1.
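
To make the role of the context concrete, the following sketch compares ideal code lengths with and without a memorized context. It assumes a binary memoryless source and the Krichevsky-Trofimov (KT) sequential estimator, neither of which is prescribed by the text. The function `kt_code_length` is a hypothetical helper, and the ideal length $-\log_2 P(x^n)$ stands in for an actual codeword length. It evaluates a single sequence rather than the expectation used in the definitions above.

```python
import math


def kt_code_length(x, context=()):
    """Ideal code length (bits) of binary sequence x under the KT estimator.

    `context` plays the role of the memorized sequence y^m: its symbol
    counts initialize the estimator before x^n is encoded.
    """
    counts = [0, 0]
    for bit in context:
        counts[bit] += 1
    length = 0.0
    for bit in x:
        # KT probability assignment: (count + 1/2) / (total + 1).
        prob = (counts[bit] + 0.5) / (counts[0] + counts[1] + 1.0)
        length += -math.log2(prob)
        counts[bit] += 1
    return length


x = [0, 0, 1, 0, 0, 0, 0, 0]          # short sequence to compress
y = [0] * 28 + [1] * 4                # context seen earlier by S and mu
l_n = kt_code_length(x)               # analogue of l_n(x^n)
l_n_given_m = kt_code_length(x, y)    # analogue of l_{n|m}(x^n)
print(l_n, l_n_given_m, l_n / l_n_given_m)
```

In this toy setting the context does not reveal $x^n$ itself; it only sharpens the estimate of the unknown parameter, which is the effect the memory-assisted scheme exploits for short sequences.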

Let $Q(l_n, l_{n|m})$ be defined as the ratio of the expected
codeword length of UcompNmem to that of UcompWmem
as

$$Q(l_n, l_{n|m}) \triangleq \frac{\mathbf{E} l_n(X^n)}{\mathbf{E} l_{n|m}(X^n)}. \qquad (1)$$



Further, let $\epsilon$ be a real number such that $0 < \epsilon < 1$. We
denote $g(n, m, \epsilon)$ as the fundamental gain of the context
memorization on the family of parametric sources on a sequence of
length $n$ using a context sequence of length $m$ for a fraction
$1 - \epsilon$ of the sources, which is defined as follows:

$$g(n, m, \epsilon) = \sup_{z \in \mathbb{R}} \left\{ z : \mathbf{P}\left[ Q(l_n, l_{n|m}) \ge z \right] \ge 1 - \epsilon \right\}. \qquad (2)$$

In other words, the fundamental gain of memorization is at
least $g(n, m, \epsilon)$ for a fraction $1 - \epsilon$ of the sources in the family.
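
As an illustration of definition (2), the sketch below computes an empirical counterpart of $g(n, m, \epsilon)$ from a finite list of values of $Q$, one per source drawn from the family. The function `empirical_gain` and the sample values are hypothetical; the paper works with the exact probability over the parameter space, not with samples.

```python
import math


def empirical_gain(q_samples, eps):
    """Largest z such that at least a (1 - eps) fraction of samples is >= z.

    `q_samples` is a hypothetical list holding one value of Q per source
    drawn from the family; this mimics sup{z : P[Q >= z] >= 1 - eps}.
    """
    if not q_samples or not 0.0 < eps < 1.0:
        raise ValueError("need samples and 0 < eps < 1")
    ordered = sorted(q_samples, reverse=True)
    # Smallest count k with k / len >= 1 - eps (tolerance guards rounding).
    k = max(1, math.ceil((1.0 - eps) * len(ordered) - 1e-12))
    return ordered[k - 1]


samples = [1.0, 1.2, 1.5, 2.0]
print(empirical_gain(samples, 0.25))
```

With four samples and $\epsilon = 0.25$, three of the four values must be at least $z$, so the returned value is the third largest sample.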

In [10], Beirami and Fekri studied the fundamental limits 
of compression without memory. In Q, they extended their 
study to the behavior of the memorization gain $g(n, m, \epsilon)$
with respect to the sequence length and the memory size for
different source models. In the rest of this paper, we investigate
the effect of the context memorization gain on a network
given that the memory-assisted compression gain $g(n, m, \epsilon)$
(hereafter referred to as $g$) is known.

III. Memory Deployment Problem in Networks 

In this section, we investigate the memorization gain
on a network. A network is represented by an undirected
graph $G(V, E)$, where $V$ is the set of $N$ nodes (vertices) and
$E \subseteq \{uv : u, v \in V\}$ is the set of edges connecting nodes
$u$ and $v$. We consider a single source $S$, which is the content
server, and a set of memories $\boldsymbol{\mu} = \{\mu_i\}_{i=1}^{M}$ chosen
out of the $N$ nodes. The content server is assumed to be a parametric
information source [7]. We assume that each client requests
a sequence of small to moderate length from the content
server. As discussed in Sec. II, there is a fundamental limit
beyond the entropy on the universal compression of a small
to moderate length sequence. Therefore, UcompNmem will
only be able to compress the content to a value that may be
significantly larger than the entropy of the sequence. On the
other hand, a memory node $\mu_j$ is capable of memorizing the
communication between the server and some client node $C_i$.

To investigate the gain of memorization in the compression 
of the network flow, we must consider two phases. The first 
is the memorization phase in which we may assume all 
memory units have visited some sufficiently long sequence 
(or equivalently, a sufficient collection of small to moderate 
length sequences) from the source. This phase is realized in 
actual communication networks by observing the fact that a 
sufficient number of clients may have previously retrieved 
small to moderate length sequences from the server such that, 
via their routing, each of the memory units has been able 
to memorize the source. In the second phase, which is the 
subject of this section, we assume each node in the entire 
network may request (a small to moderate length) content 
from the server, uniformly. Our goal is to characterize the 
memorization gain in the compression of the sequences that 
are retrieved in the second phase. The above view simplifies 
our study as we are not concerned with the transition phase 
during which the memorization is taking place in the memory 
units. Hence, we can assume each memory unit provides
the same memory-assisted compression gain $g$ on the link
from the source to itself, which depends on the length of the sequence
being transmitted as well as the length of the sequence
memorized in the memory unit, as described in Sec. II.

The goal of memory deployment is to minimize the total
cost of communication between the source and the destinations
in the network, measured in bit×hop, by deploying a set
of memories $\boldsymbol{\mu}$. We wish to study the behavior of the total
savings, in terms of bit×hop, as a function of the number of
memories, the placement of the memories, the size of the memories,
the length of the sequences, and the information source model.

Fig. 2. The shortest walk between the source and destination is not
necessarily a path when we have memory in the network.

In a network with source $S$ and a set of destinations
$\mathcal{D} = \{D_i\}$, let $f_D$ be the flow destined to $D \in \mathcal{D}$.
The distance between any two nodes $u$ and $v$ is denoted by
$d(u, v)$ and is measured as the number of hops in
the shortest path between the two nodes. As we will see later,
introducing memories to the network changes the lowest-cost
paths from the source to the destinations, as there is a gain
associated with the $S$-$\mu$ portion of the path. Therefore, we
have to modify the paths to account for the gain of memories.
Accordingly, for each destination $D$, we define the effective walk,
denoted by $W_D = \{S, u_1, \ldots, D\}$, which is the ordered set
of nodes in the modified (lowest-cost) walk between the
source and $D$. Then, we partition the set of destinations as
$\mathcal{D} = \mathcal{D}_1 \cup \mathcal{D}_2$, where
$\mathcal{D}_1 = \{D_i : \exists \mu_{D_i} \in W_{D_i}\}$ is the set of
destinations observing a memory in their effective walk, and
$\mu_{D_i} = \arg\min_{\mu \in \boldsymbol{\mu}} \{ d(S, \mu)/g + d(\mu, D_i) \}$.
The total flow $\mathcal{F}$ is then defined as

$$\mathcal{F} = \sum_{D_i \in \mathcal{D}_1} f_{D_i} \left( \frac{d(S, \mu_{D_i})}{g} + d(\mu_{D_i}, D_i) \right) + \sum_{D_j \in \mathcal{D}_2} f_{D_j} d(S, D_j). \qquad (3)$$

Using (3), we define $\hat{d}_D$, called the effective distance from
the source to $D$, as

$$\hat{d}_D = \begin{cases} \frac{d(S, \mu_D)}{g} + d(\mu_D, D), & D \in \mathcal{D}_1, \\ d(S, D), & D \in \mathcal{D}_2. \end{cases} \qquad (4)$$

In short, the effective distance is the distance when
memory-assisted source compression is performed and hence the gain
$g$ applies. By definition, $\hat{d}_D \le d(S, D)$ for all $D$. For
simplicity, we assume $f_D = 1$ for all destinations. Hence,
$\mathcal{F} = \sum_{D \in \mathcal{D}} \hat{d}_D$.
In a general network where every node can be a client, we
define a generalized network-level gain of memory deployment
as a function of the memorization gain $g$, as follows:

$$\mathcal{G}(g) = \frac{\mathcal{F}_0}{\mathcal{F}}, \qquad (5)$$

where $\mathcal{F}_0$ is the total flow in the network under UcompNmem,
i.e., $\mathcal{F}_0 = \sum_{D \in \mathcal{D}} d(S, D)$.

In order to show the challenges of the memory deployment
problem, we show how a single memory changes the
effective paths in a network with a single source. Consider the
network with the source node $S$ placed as shown in Fig. 2.
The destinations are nodes $C_1, \ldots, C_4$, and $g = 4$. The
effective walks from the source to the destinations are obviously the
shortest paths when there is no memorization (UcompNmem).
As shown in the figure, the placement of the memory changes
the effective path to $C_2$, while the shortest paths from the
source to $C_1$, $C_3$, and $C_4$ are the same as the effective paths.
Without memory, the shortest path to $C_2$ is two hops long
($S \to C_1 \to C_2$), while the memory changes the
effective walk distance to $C_2$ to $\hat{d}_{C_2} = d(S, \mu)/g + 1$, as depicted
in the figure.
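
The effective distance (4) and the network-level gain (5) can be computed on a small graph as sketched below. The topology is hypothetical (the figure itself is not reproduced here): a memory `M` hangs off `C2`, so the lowest-cost walk to `C2` is S, C1, C2, M, C2, which revisits `C2` and is therefore a walk but not a path. The helper names and the unit flows are assumptions of this sketch.

```python
from collections import deque


def bfs_hops(adj, start):
    """Hop distance from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def effective_distances(adj, source, memories, dests, g):
    """Effective distance (4) for each destination in a connected graph."""
    d_src = bfs_hops(adj, source)
    d_mem = {m: bfs_hops(adj, m) for m in memories}
    eff = {}
    for d in dests:
        best = d_src[d]  # no memory on the walk
        for m in memories:
            best = min(best, d_src[m] / g + d_mem[m][d])
        eff[d] = best
    return d_src, eff


# Hypothetical topology: S-C1-C2-M and S-C3-C4, memory M hangs off C2.
edges = [("S", "C1"), ("C1", "C2"), ("C2", "M"), ("S", "C3"), ("C3", "C4")]
adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

dests = ["C1", "C2", "C3", "C4"]
d_src, eff = effective_distances(adj, "S", ["M"], dests, g=4.0)
flow_without = sum(d_src[d] for d in dests)   # F_0
flow_with = sum(eff[d] for d in dests)        # F
gain = flow_without / flow_with               # G(g) as in (5)
print(eff, gain)
```

In this toy graph only `C2` benefits: its effective distance drops from 2 to 3/4 + 1, while the other destinations keep their shortest-path distances.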

In order to extend our study to analyze the achievable gains
in general networks and study the behavior of $\mathcal{G}(g)$, in the
next section we consider the memory deployment gain in
network graphs that resemble the ER random graph [11]. The
ER random graph is the building block of recent models
for complex graphs, and hence the results would be useful in
much broader contexts. We specifically direct our attention
to connected random graphs since they better describe real
networks.

IV. Gain of Memory Deployment on the Network 
Flow Compression 

Definition 1: An ER random graph $G(N, p)$ is an undirected,
unweighted graph on $N$ vertices where every two
vertices are connected with probability $p$.

Definition 2: Let $u, v \in G$ be any two vertices. The
diameter of a connected graph is defined as $\max_{u,v} d(u, v)$.
Similarly, the average distance of a connected graph is defined
as $\mathbf{E}[d(u, v)]$.

The following properties hold for ER random graphs:

1) If $p < \frac{(1 - \epsilon) \log N}{N}$, then $G(N, p)$ almost surely
(a.s.) has isolated vertices and is thus disconnected.

2) If $p = \frac{c \log N}{N}$ for some constant $c > 1$, then $G(N, p)$
is a.s. connected and every vertex asymptotically has
degree $c \log N$ [12].

3) The diameter of $G(N, p)$ is almost surely $\frac{\log N}{\log Np}$.

4) The average distance in $G(N, p)$, denoted by $\bar{d}$, is

$$\bar{d} = (1 + o(1)) \frac{\log N}{\log Np}, \qquad (6)$$

provided that $\frac{\log N}{\log Np}$ goes to infinity as $N \to \infty$ (this
condition is satisfied in the connected regime) [13].
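
The average-distance property (6) can be explored numerically. The sketch below samples one $G(N, p)$ graph in pure Python and compares the measured mean distance from one node with the leading term $\log N / \log Np$. The function names, the seed, and the choice $c = 2$ are assumptions made for illustration; at such a small $N$ the $o(1)$ term is not negligible, so only rough agreement should be expected.

```python
import math
import random
from collections import deque


def er_graph(n, p, seed=0):
    """Adjacency lists of one G(n, p) sample (pure-Python sketch)."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def mean_distance_from(adj, source):
    """Average hop distance from `source` to the nodes it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else 0.0


def predicted_mean_distance(n, p):
    """Leading term of (6): log N / log(Np), valid when Np > 1."""
    return math.log(n) / math.log(n * p)


n = 400
p = 2.0 * math.log(n) / n          # c = 2 > 1, the connected regime
adj = er_graph(n, p)
print(mean_distance_from(adj, 0), predicted_mean_distance(n, p))
```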

Again, the main question is how $\mathcal{G}(g)$ scales with $M$.
In order to characterize the gain of memory placement, we
consider a connected $G(N, p)$, $p \sim \frac{\log N}{N}$, with a single
source node $S$ and all other nodes as destinations. Since the expected
degree of all nodes in an ER graph is the same and every vertex
is a destination with equal probability, we select the memories
$\{\mu_i\}_{i=1}^{M}$ uniformly at random. Theorem 1 below provides
the scaling of $\mathcal{G}(g)$ with respect to $M$:

Theorem 1: Suppose $M$ is the number of deployed memories
in an ER random graph. Let $\epsilon$ be a positive real number.

(a) If $M = O(N^{\frac{1}{g} - \epsilon})$, then $\mathcal{G}(g) \sim 1$.¹

(b) If $M = \Omega(N^{\frac{1}{g} + \epsilon})$, then all the destinations benefit
from memory and $\mathcal{G}(g) \gtrsim \frac{1}{1 - \epsilon}$.

¹ In this paper, we have used the following asymptotic notation:
$f(x) \sim g(x)$ iff $f(x)/g(x) \to 1$.

Sketch of the proof: We first find an upper bound on
the number of destinations that benefit from each memory. This
upper bound is sufficient to derive part (a) of the theorem.
For the second part, we find a lower bound on the number of
benefiting destinations. ■

To characterize $\mathcal{G}(g)$, we first need to find $\mathcal{F}_0$. The average
distance from the source to a node is $\bar{d}$. Thus, $\mathcal{F}_0 = N \bar{d}$. For
large $N$, (6) results in $\mathcal{F}_0 \sim N \frac{\log N}{\log \log N}$.

Next, we need to find $\mathcal{F}$. For every memory $\mu$, we consider a
neighborhood $\mathcal{N}_r(\mu)$, as shown in Fig. 3. This neighborhood
consists of all vertices $v$ within distance $r$ from $\mu$. We choose
$r$ such that, almost surely, all nodes in $\mathcal{N}_r(\mu)$ benefit
from the memory $\mu$. Clearly, if
$\frac{d(S, \mu)}{g} + d(\mu, v) \ge d(S, v)$, the
benefit of the memory for node $v$ vanishes, and only nodes at
distances less than $r$ benefit from the memory $\mu$. Given $g$, we
denote this set of nodes benefiting from $\mu$ by $\mathcal{N}_r(\mu, g)$:

$$\mathcal{N}_r(\mu, g) = \left\{ v : \frac{d(S, \mu)}{g} + d(\mu, v) < d(S, v) \right\}. \qquad (7)$$



Since memories are uniformly placed, the average value of
$d(S, \mu)$ is equal to $\bar{d}$. Similarly, the average of $d(S, v)$
is also $\bar{d}$. Hence, solving for $r$ in (7) and then using the result
on the average distance in (6), we conclude

$$r = \left(1 - \frac{1}{g}\right) \frac{\log N}{\log \log N}. \qquad (8)$$



The following lemma, by Chung and Lu [13], gives an upper
bound on the total number of vertices in the neighborhood,
$|\mathcal{N}_r(\mu_j, g)|$, where $|\cdot|$ is the set size operator.

Lemma 2 ([13]): Assume a connected random graph.
Then, for any $\epsilon > 0$, with probability at least $1 - \frac{1}{(\log N)^2}$,
we have $|\mathcal{N}_r(\mu_j, g)| \le (1 + 2\epsilon)(Np)^r$ for $1 \le r \le \log N$.

Using Lemma 2 and (8), we deduce that

$$|\mathcal{N}_r(\mu_j, g)| \le (1 + 2\epsilon)(\log N)^{\left(1 - \frac{1}{g}\right) \frac{\log N}{\log \log N}} = (1 + 2\epsilon) N^{1 - 1/g}. \qquad (9)$$



Therefore, the total number of nodes gaining from the
memories is upper-bounded by
$\sum_{j=1}^{M} |\mathcal{N}_r(\mu_j, g)| \le M (1 + 2\epsilon) N^{1 - 1/g}$.
As we will see, from (9) it is clear that the gain of
memory vanishes if $M$ is chosen small. The value $N^{1/g}$ is the
threshold value for the network-wide gain. More accurately, if
$M = O(N^{\frac{1}{g} - \epsilon})$, there is no gain from memories.
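
The quantities in (8) and (9) and the threshold $N^{1/g}$ are easy to evaluate numerically. The sketch below does so under the assumption $Np = \log N$; the function names and the example values $N = 10^6$ and $g = 1.25$ are chosen only for illustration.

```python
import math


def memory_threshold(n, g):
    """Threshold N^(1/g) on the number of memories."""
    return n ** (1.0 / g)


def neighborhood_radius(n, g):
    """Radius r from (8), assuming Np = log N."""
    return (1.0 - 1.0 / g) * math.log(n) / math.log(math.log(n))


def neighborhood_bound(n, g, eps=0.0):
    """Right-hand side of (9) evaluated as (1 + 2 eps) (log N)^r."""
    return (1.0 + 2.0 * eps) * math.log(n) ** neighborhood_radius(n, g)


n, g = 10**6, 1.25
print(memory_threshold(n, g))      # about N^0.8
print(neighborhood_bound(n, g))    # matches N^(1 - 1/g) = N^0.2
```

The second value confirms the identity $(\log N)^r = N^{1 - 1/g}$ used in (9): each memory serves roughly $N^{1 - 1/g}$ nodes, so on the order of $N^{1/g}$ memories are needed before the covered nodes can add up to $N$.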

Proof of Theorem 1(a): For all the nodes in $\mathcal{N}_r(\mu_j, g)$,
we have a flow gain of $g$. Let $M \sim N^{\frac{1}{g} - \epsilon}$; then we have

$$\mathcal{G}(g) \le \frac{N \bar{d}}{\frac{\bar{d}}{g} M |\mathcal{N}_r(\mu_j, g)| + \bar{d} \left( N - M |\mathcal{N}_r(\mu_j, g)| \right)} \qquad (10)$$

$$\overset{\text{a.s.}}{\le} \frac{N}{N - (1 - 1/g) M N^{1 - 1/g}} \qquad (11)$$

$$= \frac{N}{N - (1 - 1/g) N^{1 - \epsilon}} \sim 1,$$

where the inequality in (10) follows from the double counting
of the destination nodes that may reside in more than one
neighborhood. Also, (11) follows from substituting (9) in (10). ■

Fig. 3. Memory Neighborhood

Since we need more than $N^{1/g}$ memory units to have a
network-wide gain, the next question is how $\mathcal{G}(g)$ scales
when the number of memory units exceeds $N^{1/g}$. To answer this
question, we need to establish a lower bound on the
neighborhood size and the number of nodes benefiting from memory.
Further, we have to account for the possible double counting of
the intersection between the memory neighborhoods. We use
the following concentration inequality from [13] to establish
the desired bound.

Proposition 3 ([13]): If $X_1, X_2, \ldots, X_n$ are non-negative
independent random variables, then the sum $X = \sum_{i=1}^{n} X_i$
satisfies the bound

$$\mathbf{P}[X \le \mathbf{E}[X] - \lambda] \le \exp\left( -\frac{\lambda^2}{2 \sum_{i=1}^{n} \mathbf{E}[X_i^2]} \right).$$

This inequality will be helpful to show that the quantities of
interest concentrate around their expected values.

The following lemma provides a lower bound on the
neighborhood size $|\mathcal{N}_r(\mu, g)|$; the lower bound on $\mathcal{G}(g)$,
as we show, is then immediate.

Lemma 4: Consider a set of vertices $V$ of $G(N, p)$ such
that $\frac{|V|}{N} = o(1)$. For $0 < \epsilon < 1$, with probability at least
$1 - e^{-Np|V|\epsilon^2/2}$, we have

$$|\mathcal{N}_r(\mu, g)| \ge (1 - \epsilon)(Np)^r. \qquad (12)$$

Proof: The vertex boundary of $V$, denoted by $\Gamma(V)$,
consists of all vertices in $G$ adjacent to some vertex in $V$:

$$\Gamma(V) = \{ u : u \notin V, \text{ and } u \text{ is adjacent to some } v \in V \}.$$

Let $X_u$ be the indicator random variable that a vertex $u$ is in
$\Gamma(V)$, i.e., $\mathbf{P}[X_u = 1] = \mathbf{P}[u \in \Gamma(V)]$. Then,

$$\mathbf{E}[|\Gamma(V)|] = \sum_{u \notin V} \mathbf{E}[X_u] = \sum_{u \notin V} \mathbf{P}[u \in \Gamma(V)] \ge p|V|(N - |V|) = (1 - o(1)) Np|V|, \qquad (13)$$

where the inequality in (13) follows from

$$\mathbf{P}[u \in \Gamma(V)] = 1 - (1 - p)^{|V|} \ge 1 - e^{-p|V|} \approx p|V|,$$

and the second part holds because $\frac{|V|}{N} = o(1)$. Since the $X_u$'s
are non-negative independent random variables, by applying
Proposition 3 with $\lambda = \sqrt{\alpha \mathbf{E}[|\Gamma(V)|]}$, with probability
at least $1 - e^{-\alpha/2}$ we have

$$|\Gamma(V)| \ge \mathbf{E}[|\Gamma(V)|] - \sqrt{\alpha \mathbf{E}[|\Gamma(V)|]} \ge (1 - \epsilon) Np|V|. \qquad (14)$$

By picking a single vertex and applying (14) inductively $r$
times, and then adding up the number of adjacent nodes, we
obtain (12). ■
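
The choice of $\lambda$ in the proof can be unpacked as follows. This is one reading of the step from Proposition 3 to (14), under the assumption that $\alpha$ is set to $\epsilon^2 \mathbf{E}[|\Gamma(V)|]$, which the text does not state explicitly.

```latex
% X_u are indicators, so X_u^2 = X_u and
\sum_{u} \mathbf{E}[X_u^2] = \sum_{u} \mathbf{E}[X_u] = \mathbf{E}[|\Gamma(V)|].
% Proposition 3 with \lambda = \sqrt{\alpha \mathbf{E}[|\Gamma(V)|]} gives
\mathbf{P}\big[\,|\Gamma(V)| \le \mathbf{E}[|\Gamma(V)|] - \lambda\,\big]
  \le \exp\!\left(-\frac{\alpha\,\mathbf{E}[|\Gamma(V)|]}{2\,\mathbf{E}[|\Gamma(V)|]}\right)
  = e^{-\alpha/2}.
% Assumed choice: \alpha = \epsilon^2 \mathbf{E}[|\Gamma(V)|], so \lambda = \epsilon \mathbf{E}[|\Gamma(V)|] and
|\Gamma(V)| \ge (1-\epsilon)\,\mathbf{E}[|\Gamma(V)|]
\quad \text{with probability at least } 1 - e^{-\epsilon^2 \mathbf{E}[|\Gamma(V)|]/2}.
```

Combined with $\mathbf{E}[|\Gamma(V)|] = (1 - o(1)) Np|V|$ from (13), this is consistent with the probability $1 - e^{-Np|V|\epsilon^2/2}$ stated in Lemma 4.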

Now that we have a lower bound on the number of nodes
benefiting from each memory, we show that by increasing the
number of memories beyond $M = N^{1/g}$, the memories effectively
cover all the nodes in the graph and hence all the nodes
gain from the memory placement.

In order to limit the intersection between the neighborhoods,
we reduce $r$ to $r_\delta$ as below:

$$r_\delta = \left(1 - \frac{1}{g} - \delta\right) \frac{\log N}{\log \log N}. \qquad (15)$$

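With this choice of $r_\delta$, by Lemmas 2 and 4 we deduce
that the probability that a random node $u \in G$ belongs to
the neighborhood $\mathcal{N}_{r_\delta}(\mu_j, g)$ of the memory $\mu_j$ is
$\frac{|\mathcal{N}_{r_\delta}(\mu_j, g)|}{N} \approx N^{-1/g - \delta}$.
Hence, the expected number of covered nodes is

$$\mathbf{E}\left[ \left| \bigcup_{j=1}^{M} \mathcal{N}_{r_\delta}(\mu_j, g) \right| \right] = \sum_{u \in G} \mathbf{P}\left[ u \in \bigcup_{j=1}^{M} \mathcal{N}_{r_\delta}(\mu_j, g) \right] \approx N \left[ M N^{-1/g - \delta} \right] = N, \qquad (16)$$

where the last equality holds by choosing $M \sim N^{1/g + \delta}$.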

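To show that the number of covered nodes is concentrated
around its mean, we use Prop. 3 again with
$\lambda = \sqrt{\alpha \mathbf{E}[|\bigcup_{j} \mathcal{N}_{r_\delta}(\mu_j, g)|]}$.
Then, with probability at least $1 - e^{-\alpha/2}$, we have

$$\left| \bigcup_{j=1}^{M} \mathcal{N}_{r_\delta}(\mu_j, g) \right| \ge \mathbf{E}\left[ \left| \bigcup_{j=1}^{M} \mathcal{N}_{r_\delta}(\mu_j, g) \right| \right] - \sqrt{\alpha \mathbf{E}\left[ \left| \bigcup_{j=1}^{M} \mathcal{N}_{r_\delta}(\mu_j, g) \right| \right]} \ge (1 - o(1)) N.$$

Hence, the memories cover, almost surely, all of the nodes.

Since all nodes are covered with high probability, we can
associate each node with a neighborhood $\mathcal{N}_{r_\delta}(\mu_j, g)$, for
which the nodes' distances from the memory are
$(1 - o(1)) r_\delta$.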

Proof of Theorem 1(b): By (16), we can bound the
network-wide gain of the memory from below. We have

$$\mathcal{G}(g) \gtrsim \frac{N \bar{d}}{\left( \bar{d}/g + r_\delta \right) N} \qquad (17)$$

$$= \frac{1}{1/g + (1 - 1/g - \delta)} = \frac{1}{1 - \delta}, \qquad (18)$$

where (17) holds because the distance of the nodes from
memory is $r_\delta$, asymptotically almost surely. ■

As the number of memories becomes close to $N$, i.e.,
$\delta \to (1 - \frac{1}{g})$, the gain $\mathcal{G} \to g$, as expected. In the next
section, we verify our results on memory-assisted source coding and the
network-wide gain of memory via numerical simulations.

Fig. 4. Network-level gain $\mathcal{G}$ versus $\log_N(M)$ for different network
sizes $N$ and $g = 1.25$.
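
The two regimes of Theorem 1 can be combined into a single curve as a function of $x = \log_N(M)$. The sketch below is one piecewise reading of the theorem and of (18), with $\delta = x - 1/g$; the function name and this parameterization are assumptions of the sketch, not notation from the paper.

```python
def theoretical_gain(x, g):
    """Asymptotic network-wide gain for M = N**x memories.

    Piecewise reading of Theorem 1 (an assumption of this sketch):
    gain 1 below the threshold x = 1/g, and 1 / (1 - delta) with
    delta = x - 1/g above it, capped at x = 1 where the gain equals g.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x = log_N(M) must lie in [0, 1]")
    delta = x - 1.0 / g
    if delta <= 0.0:
        return 1.0
    return 1.0 / (1.0 - delta)


g = 1.25
for x in (0.5, 0.8, 0.9, 1.0):
    print(x, theoretical_gain(x, g))
```

At $x = 1$ the expression gives $\delta = 1 - 1/g$ and a gain of $g$, which agrees with the limiting behavior noted above.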

V. Simulation Results 

In this section, we first demonstrate our theoretical results 
through an example. We consider the source to be a first-order 
Markov source with alphabet size equal to 256. In f?^, Beirami 
and Fekri derived a lower bound on the memorization gain as 
a function of the sequence length and the memory size. For 
example, they showed that g(512kB, 8MB, 0.05) is about 1.25, 
i.e., with a memory of 8MB, a gain of 1.25 is obtained on the 
memory-assisted compression of 512kB long sequences Q. 
Fig. 4 presents the simulation results for the network-wide gain
for different network sizes versus $\log_N(M)$ when $g = 1.25$.
The rightmost solid curve is our theoretical result in (18). For
small values of $M$, the network-wide gain tends to 1 as
$N \to \infty$, while for large $M$, $\mathcal{G}$ tends to $g$. Also, as $N$
increases, the simulation results approach the theoretical limit for
both small and large values of $M$.
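
A small Monte Carlo experiment in the spirit of this section can be sketched as follows. It is not the authors' simulation setup: the graph size, the choice $p = 2 \log N / N$, the single seed, and the function names are all assumptions, and one sample per value of $M$ is far too few for smooth curves. The sketch draws one ER graph, places $M$ memories uniformly at random, and evaluates (5) with unit flows.

```python
import math
import random
from collections import deque


def bfs(adj, start):
    """Hop distances from `start` (unreachable nodes are omitted)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def network_gain(n, p, num_memories, g, seed=0):
    """One Monte Carlo sample of the network-wide gain G(g) on G(n, p)."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    source = 0
    d_src = bfs(adj, source)
    # Memories are drawn uniformly at random among the non-source nodes.
    memories = rng.sample(range(1, n), num_memories)
    # Only destinations reachable from the source are counted.
    eff = {v: float(d) for v, d in d_src.items() if v != source}
    for m in memories:
        if m not in d_src:
            continue  # memory not reachable from the source
        d_mem = bfs(adj, m)
        for v in eff:
            eff[v] = min(eff[v], d_src[m] / g + d_mem[v])
    flow_without = sum(d_src[v] for v in eff)
    flow_with = sum(eff.values())
    return flow_without / flow_with if flow_with > 0 else 1.0


n, g = 200, 1.25
p = 2.0 * math.log(n) / n
for m in (1, 10, 50, 199):
    print(m, round(math.log(m, n), 2), network_gain(n, p, m, g))
```

When every non-source node is a memory, each destination is served at cost $d(S, D)/g$ and the sampled gain equals $g$; with no memories it equals 1. Intermediate values of $M$ fall between these two extremes.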